Optimal symbol alignment distance: a new distance for sequences of symbols


Abstract

Comparison functions for sequences of symbols are important components of many applications, such as clustering, data cleansing, and data integration. Over the years, much effort has gone into improving the performance of such comparison functions. These improvements have come either at the cost of reduced comparison accuracy or by compromising certain basic properties of the functions, such as the triangle inequality. In this paper, we propose a new distance for sequences of symbols (or strings) called the Optimal Symbol Alignment distance (OSA distance, for short). This distance has a very low computational cost in practice, which makes it a suitable candidate for computing distances in applications with large numbers of (very long) sequences. After providing a mathematical proof that the OSA distance is a real distance, we present experiments for different scenarios (DNA sequences, record linkage, ...), showing that the proposed distance outperforms, in terms of execution time and/or accuracy, other well-known comparison functions such as the Edit or Jaro-Winkler distances.
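
The abstract does not describe the OSA algorithm itself, so the sketch below does not attempt to reproduce it. It illustrates only the classical Edit (Levenshtein) distance that the abstract names as a baseline, together with a brute-force check of the triangle inequality on a handful of strings. The function names `edit_distance` and `satisfies_triangle_inequality` and the sample strings are assumptions made for illustration.

```python
from itertools import permutations


def edit_distance(a: str, b: str) -> int:
    """Classical Levenshtein distance via dynamic programming (not the OSA distance)."""
    # At the start of row i, prev[j] is the distance between the
    # first i-1 symbols of a and the first j symbols of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def satisfies_triangle_inequality(dist, items) -> bool:
    """Check d(x, z) <= d(x, y) + d(y, z) over all ordered triples of items."""
    return all(dist(x, z) <= dist(x, y) + dist(y, z)
               for x, y, z in permutations(items, 3))


# Hypothetical sample data: short DNA-like strings.
samples = ["ACGT", "AGT", "TTGA", "ACCA"]
print(edit_distance("kitten", "sitting"))
print(satisfies_triangle_inequality(edit_distance, samples))
```

This baseline runs in time proportional to the product of the two sequence lengths, which is the kind of cost that becomes noticeable on large collections of very long sequences.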